Cornering Stationary and Restless Mixing Bandits with Remix-UCB

Authors

  • Julien Audiffren
  • Liva Ralaivola
Abstract

We study the restless bandit problem where arms are associated with stationary φ-mixing processes and where rewards are therefore dependent: the question that arises from this setting is that of carefully recovering some independence by ‘ignoring’ the values of some rewards. As we shall see, the bandit problem we tackle requires us to address the exploration/exploitation/independence trade-off, which we do by considering the idea of a waiting arm in the new Remix-UCB algorithm, a generalization of Improved-UCB for the problem at hand, that we introduce. We provide a regret analysis for this bandit strategy; two noticeable features of Remix-UCB are that i) it reduces to the regular Improved-UCB when the φ-mixing coefficients are all 0, i.e. when the i.i.d. scenario is recovered, and ii) when $\varphi(n) = O(n^{-\alpha})$, it is able to ensure a controlled regret of order $\tilde{\Theta}\big(\Delta_*^{(\alpha-2)/\alpha} \log^{1/\alpha} T\big)$, where $\Delta_*$ encodes the distance between the best arm and the best suboptimal arm, even in the case when $\alpha < 1$, i.e. the case when the φ-mixing coefficients are not summable.
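The idea of recovering some independence by ignoring rewards can be illustrated with a small simulation. The sketch below is not Remix-UCB and is not taken from the paper: it only thins a dependent reward stream and compares empirical correlations. The helper names (`ar1_rewards`, `thin`, `lag1_autocorr`) are hypothetical, and the AR(1) process is an assumed stand-in for a dependent reward sequence; it is not claimed to satisfy the φ-mixing assumption of the abstract.

```python
import random


def ar1_rewards(n, rho=0.9, seed=0):
    """Generate n dependent rewards from a stationary AR(1) process.

    Assumption for illustration only: the paper's arms are phi-mixing
    processes, which this toy process is not claimed to be.
    """
    rng = random.Random(seed)
    # Draw the first value from the stationary distribution.
    x = rng.gauss(0.0, 1.0 / (1.0 - rho ** 2) ** 0.5)
    out = []
    for _ in range(n):
        out.append(x)
        x = rho * x + rng.gauss(0.0, 1.0)
    return out


def thin(rewards, gap):
    """Keep one reward, ignore the next `gap` rewards, and repeat."""
    return rewards[:: gap + 1]


def lag1_autocorr(xs):
    """Empirical lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


rewards = ar1_rewards(20000)
full_corr = lag1_autocorr(rewards)
thinned_corr = lag1_autocorr(thin(rewards, gap=19))
print(f"lag-1 autocorrelation, all rewards:     {full_corr:.3f}")
print(f"lag-1 autocorrelation, every 20th only: {thinned_corr:.3f}")
```

Consecutive kept rewards are much less correlated after thinning, but only one reward in twenty is used. That cost in samples is one way to read the independence side of the trade-off named in the abstract.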

Related resources

Stationary Mixing Bandits

We study the bandit problem where arms are associated with stationary φ-mixing processes and where rewards are therefore dependent: the question that arises from this setting is that of recovering some independence by ignoring the value of some rewards. As we shall see, the bandit problem we tackle requires us to address the exploration/exploitation/independence trade-off. To do so, we provide ...

Improving Online Marketing Experiments with Drifting Multi-armed Bandits

Restless bandits model the exploration vs. exploitation trade-off in a changing (non-stationary) world. Restless bandits have been studied in both the context of continuously-changing (drifting) and change-point (sudden) restlessness. In this work, we study specific classes of drifting restless bandits selected for their relevance to modelling an online website optimization process. The contrib...

A Survey on Contextual Multi-armed Bandits

Contents excerpt: 4 Stochastic Contextual Bandits; 4.1 Stochastic Contextual Bandits with Linear Realizability Assumption; 4.1.1 LinUCB/SupLinUCB; 4.1.2 LinREL/SupLinREL; 4.1.3 CofineUCB; 4.1.4 Thompson Sampling with Linear Payoffs...

Discrepancy-Based Algorithms for Non-Stationary Rested Bandits

We study the multi-armed bandit problem where the rewards are realizations of general nonstationary stochastic processes, a setting that generalizes many existing lines of work and analyses. In particular, we present a theoretical analysis and derive regret guarantees for rested bandits in which the reward distribution of each arm changes only when we pull that arm. Remarkably, our regret bound...

Publication date: 2015